Auditory Spectrum-Based Pitched Instrument Onset Detection
In this paper, a method for onset detection of music signals using auditory spectra is proposed. The auditory spectrogram provides a time-frequency representation that employs a sound processing model resembling the human auditory system. Recent work on onset detection employs DFT-based features describing spectral energy and phase differences, as well as pitch-based features. These features are often combined to maximize detection performance. Here, the spectral flux and phase slope features are derived in the auditory framework, and a novel fundamental frequency estimation algorithm based on auditory spectra is introduced. An onset detection algorithm is proposed, which processes and combines the aforementioned features at the decision level. Experiments are conducted on a dataset covering 11 pitched instrument types, consisting of 1829 onsets in total. Results indicate that auditory representations outperform various state-of-the-art approaches, with the onset detection algorithm reaching an F-measure of 82.6%.
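For orientation, the DFT-based spectral flux baseline that the paper builds on can be sketched as follows (a generic half-wave-rectified variant with illustrative parameters; the paper's contribution replaces the DFT front end with an auditory spectrogram):

```python
import numpy as np

def spectral_flux_onsets(signal, sr, frame=1024, hop=512, delta=0.1):
    """Detect onsets via half-wave-rectified spectral flux over a
    windowed DFT magnitude spectrogram, followed by simple peak picking.
    This is a generic baseline sketch, not the auditory-spectrum method."""
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    mags = np.array([np.abs(np.fft.rfft(window * signal[i * hop:i * hop + frame]))
                     for i in range(n_frames)])
    # Half-wave rectification keeps only frame-to-frame magnitude increases,
    # summed over frequency bins, so energy decays do not trigger onsets.
    flux = np.maximum(mags[1:] - mags[:-1], 0.0).sum(axis=1)
    flux /= flux.max() + 1e-12
    # Peak picking: local maxima above a fixed threshold.
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > flux[i - 1] and flux[i] >= flux[i + 1] and flux[i] > delta]
    return [(i + 1) * hop / sr for i in peaks]  # onset times in seconds
```

The auditory-spectrum variant keeps the same rectified-difference detection function but computes it over the auditory time-frequency representation instead of the DFT magnitudes.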
Expressive visual text to speech and expression adaptation using deep neural networks
In this paper, we present an expressive visual text-to-speech system (VTTS) based on a deep neural network (DNN). Given an input text sentence and a set of expression tags, the VTTS is able to produce not only the audio speech, but also the accompanying facial movements. The expressions can either be one of the expressions in the training corpus or a blend of expressions from the training corpus. Furthermore, we present a method of adapting a previously trained DNN to include a new expression using a small amount of training data. Experiments show that the proposed DNN-based VTTS is preferred by 57.9% over the baseline hidden Markov model based VTTS which uses cluster adaptive training.
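One plausible reading of "a blend of expressions" is a convex combination of per-expression control vectors fed to the DNN alongside the linguistic input; a minimal sketch (the function name, shapes and weighting scheme are assumptions, not the paper's):

```python
import numpy as np

def blend_expression_code(codes, weights):
    """Form a single DNN control vector as a convex combination of
    per-expression codes from the training corpus (illustrative only)."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                     # normalise to a convex combination
    return w @ np.asarray(codes)     # weighted sum of expression codes
```

With one-hot codes this reduces to the normalised weights themselves; a learned embedding per expression would be blended the same way.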
Tensin1 expression and function in chronic obstructive pulmonary disease
Chronic obstructive pulmonary disease (COPD) constitutes a major cause of morbidity and mortality. Genome-wide association studies have shown significant associations between airflow obstruction or COPD and a non-synonymous SNP in the TNS1 gene, which encodes tensin1. However, the expression, cellular distribution and function of tensin1 in human airway tissue and cells are unknown. We therefore examined these characteristics in tissue and cells from controls and people with COPD or asthma. Airway tissue was immunostained for tensin1. Tensin1 expression in cultured human airway smooth muscle cells (HASMCs) was evaluated using qRT-PCR, western blotting and immunofluorescent staining. siRNAs were used to downregulate tensin1 expression. Tensin1 expression was increased in the airway smooth muscle and lamina propria in COPD tissue, but not asthma, when compared to controls. Tensin1 was expressed in HASMCs and upregulated by TGFβ1. TGFβ1 and fibronectin increased the localisation of tensin1 to fibrillar adhesions. Tensin1 and α-smooth muscle actin (αSMA) were strongly co-localised, and tensin1 depletion in HASMCs attenuated both αSMA expression and contraction of collagen gels. In summary, tensin1 expression is increased in COPD airways, and may promote airway obstruction by enhancing the expression of contractile proteins and their localisation to stress fibres in HASMCs.
Bird detection in audio: a survey and a challenge
Many biological monitoring projects rely on acoustic detection of birds. Despite increasingly large datasets, this detection is often manual or semi-automatic, requiring manual tuning and postprocessing. We review the state of the art in automatic bird sound detection, and identify a widespread need for tuning-free and species-agnostic approaches. We introduce new datasets and an IEEE research challenge to address this need, to make possible the development of fully automatic algorithms for bird sound detection.
Robust excitation-based features for Automatic Speech Recognition
In this paper we investigate the use of noise-robust features characterizing the speech excitation signal as complementary features to the usually considered vocal-tract-based features for automatic speech recognition (ASR). The features are tested in a state-of-the-art Deep Neural Network (DNN) based hybrid acoustic model for speech recognition. The suggested excitation features expand the set of excitation features previously considered for ASR, with the expectation that these features help to better discriminate the broad phonetic classes (e.g., fricatives, nasals, vowels, etc.). Relative improvements in the word error rate are observed in the AMI meeting transcription system, with greater gains (about 5%) when PLP features are combined with the suggested excitation features. For Aurora 4, significant improvements are observed as well. Combining the suggested excitation features with filter banks, a word error rate of 9.96% is achieved. This is the author accepted manuscript; the final version is available from IEEE via http://dx.doi.org/10.1109/ICASSP.2015.717885
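Combining complementary feature streams for a DNN acoustic model is typically done by per-dimension normalisation followed by frame-wise concatenation; a minimal sketch (the normalisation choice is an illustrative assumption, not prescribed by the paper):

```python
import numpy as np

def combine_features(vocal_tract_feats, excitation_feats):
    """Concatenate per-frame vocal-tract features (e.g. PLP or filter
    banks) with excitation features after mean/variance normalisation,
    yielding one augmented input vector per frame for a DNN."""
    def mvn(x):
        # Per-dimension mean/variance normalisation over the utterance.
        return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
    return np.hstack([mvn(vocal_tract_feats), mvn(excitation_feats)])
```

The DNN then sees each frame as a single vector, so any complementary information in the excitation stream is available to every hidden layer.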
Dysphonia Detection based on modulation spectral features and cepstral coefficients
In this paper, we combine modulation spectral features with mel-frequency cepstral coefficients for automatic detection of dysphonia. For classification purposes, the dimensions of the original modulation spectra are reduced using higher-order singular value decomposition (HOSVD). The most relevant features are selected based on their mutual information with the discrimination between normophonic and dysphonic speakers made by experts. Features that highly correlate with voice alterations are then fed to a support vector machine (SVM) classifier to provide an automatic decision. Recognition experiments using two different databases suggest that the system provides complementary information to the standard mel-cepstral features.
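The dimensionality-reduction step can be illustrated with an ordinary truncated SVD on the unfolded modulation-spectrum tensor (a simplified stand-in for HOSVD, which instead factorises each tensor mode; shapes and the function name are illustrative):

```python
import numpy as np

def reduce_modulation_features(tensor, k):
    """Reduce a (frames x acoustic-freq x modulation-freq) modulation
    spectrum to k features per frame by truncated SVD on its unfolding.
    A simplified stand-in for the HOSVD step, for illustration only."""
    X = tensor.reshape(tensor.shape[0], -1)  # unfold to frames x features
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T                      # project onto top-k components
```

HOSVD generalises this by applying an SVD along every mode of the tensor rather than a single unfolding, preserving the separate acoustic- and modulation-frequency structure.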
A fixed dimension and perceptually based dynamic sinusoidal model of speech
This paper presents a fixed- and low-dimensional, perceptually based dynamic sinusoidal model of speech referred to as PDM (Perceptual Dynamic Model). To decrease and fix the number of sinusoidal components typically used in the standard sinusoidal model, we propose to use only one dynamic sinusoidal component per critical band. For each band, the sinusoid with the maximum spectral amplitude is selected and associated with the centre frequency of that critical band. The model is expanded at low frequencies by incorporating sinusoids at the boundaries of the corresponding bands, while at the higher frequencies a modulated noise component is used. A listening test is conducted to compare speech reconstructed with PDM and state-of-the-art models of speech, where all models are constrained to use an equal number of parameters. The results show that PDM is clearly preferred in terms of quality over the other systems. Index Terms: sinusoidal model, critical band, vocoder.
Selective CO₂ capture in metal-organic frameworks with azine-functionalized pores generated by mechanosynthesis
Two new three-dimensional porous Zn(II)-based metal-organic frameworks, containing azine-functionalized pores, have been readily and quickly isolated via mechanosynthesis, using a nonlinear dicarboxylate and linear N-donor ligands. The use of nonfunctionalized and methyl-functionalized N-donor ligands has led to the formation of frameworks with different topologies and metal-ligand connectivities, and therefore different pore sizes and accessible volumes. Despite this, both metal-organic frameworks (MOFs) possess comparable BET surface areas and CO₂ uptakes at 273 and 298 K at 1 bar. The network with narrow and interconnected pores in three dimensions shows greater affinity for CO₂ compared to the network with one-dimensional and relatively large pores, attributable to the more effective interactions with the azine groups.